;;; -*- Syntax: Common-Lisp; Package: (AUTOCLASS CL); Base: 10; Mode: TEXT -*-
;;; File: Autoclass-X:doc;checkpoint.text
;;;————————————————————————-;;;
;;; AUTOCLASS 3.0 Released 5/90 contact: Taylor@pluto.arc.nasa.gov ;;;
;;; by P. Cheeseman, J. Stutz, R. Hanson, W. Taylor ;;;
;;; NASA Ames Research Center, MS 244-17, Moffett Field, CA 94035 ;;;
;;; ;;;
;;; Copyright (C) 1990 Research Institute for Advanced Computer Science. ;;;
;;; All rights reserved. The RIACS Software Policy contains specific ;;;
;;; terms and conditions on the use of this software, and must be ;;;
;;; distributed with any copies. THIS FILE MAY BE REDISTRIBUTED. This ;;;
;;; copyright and notice must be preserved in all copies made of this file.;;;
;;;————————————————————————-;;;
;;; added 6/06/90 for 3.0.2
Checkpointing:
With very large databases there is a significant probability of a system
crash during any one classification try. Under such circumstances it is
advisable to take the time to checkpoint the calculations for possible restart.
The code modifications given at the end of this file provide for
checkpointing the current state of a classification at the end of each basic
convergence cycle. The code can either be loaded after the regular AutoClass
system is loaded, or substituted for the standard definition of Base-Cycle. This
provides a modified version of Base-Cycle that checks for a value of the global
variable *checkpoint-file*. So long as *checkpoint-file* is nil, there is no
apparent change from the standard Base-Cycle. When *checkpoint-file* is a
pathname, Base-Cycle uses Save-Clsf-Seq to save a compressed version of the
classification. On Symbolics systems this uses Dump-Object-To-File to make a
binary version. This is much quicker than writing a normal ASCII file, but is
not human readable. On systems where Dump-Object-To-File is not defined, saved
files are written in ASCII. If you choose to write your own version of
Dump-Object-To-File, please send us a copy so we can pass it on. Note that
checkpointing will slow the search process, noticeably so when writing out in
ASCII. With ASCII output, successive file versions supersede the previous ones.
To recover the classification after rebooting do:
(setf clsf (first (Get-Clsf-Seq *checkpoint-file* :expand t :wts t)))
If needed, this will cause the appropriate database and models to be loaded,
provided there has been no change in their filenames since the time they were
loaded for the checkpointed classification.
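The save-and-recover pattern described above can be illustrated with a small
generic sketch. This is not the AutoClass implementation of Save-Clsf-Seq or
Get-Clsf-Seq; the names demo-save-state and demo-load-state are hypothetical,
and the sketch assumes the saved state is an ordinary readable Lisp object such
as a list. It writes ASCII and supersedes any earlier version of the file.

```lisp
;; Hypothetical helpers, for illustration only; not part of AutoClass.
(defun demo-save-state (state pathname)
  "Write STATE readably to PATHNAME in ASCII, superseding any earlier version."
  (with-open-file (out pathname :direction :output
                                :if-exists :supersede
                                :if-does-not-exist :create)
    (with-standard-io-syntax
      (print state out)))
  pathname)

(defun demo-load-state (pathname)
  "Read back the single object written by DEMO-SAVE-STATE."
  (with-open-file (in pathname :direction :input)
    (with-standard-io-syntax
      (read in))))

;; Usage: save after each cycle, read back after a restart.
;; (demo-save-state (list :n-classes 3 :weights (list 0.5 0.3 0.2))
;;                  "demo-checkpoint.tmp")
;; (demo-load-state "demo-checkpoint.tmp")
```

Writing readable ASCII keeps the file inspectable, at the cost of the slower
output noted above.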
Protocols:
The standard search control function, Autoclass-Search, has no facilities
for restarting from a partially converged classification. However, it does
quickly find a good distribution for the number of classes to start with for
any particular combination of initialization and search. We therefore advise
that before trying to classify the full database, one apply Autoclass-Search to
several small randomly chosen fractions of the database, without using
checkpointing. This will give a good measure of the minimum number of classes
to expect, and may suggest ways to improve the model.
If there is no great chance of a system crash during a full classification,
we suggest continuing with Autoclass-Search. Use the same arguments as before,
except that :start-J-list should be something like a list of the number of
classes in the best six classifications seen so far (i.e., found during the best
of the data subset searches). Assuming you have just completed the initial
search, this could be done by:
(setf search *) ;; grabbing the results of the initial search.
(setf j-list
(map 'list #'search-try-j-in (safe-subseq (search-tries search) 0 6)))
(autoclass-search :start-j-list j-list .....) ;; With the full data set.
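As a self-contained illustration of how such a j-list is formed, the following
sketch uses a hypothetical stand-in structure, demo-try, in place of the real
search tries. It assumes that tries can be ranked by a score where higher is
better, and that clipping the subsequence to the available length is the
intended behavior when fewer than six tries exist. None of the demo- names are
part of AutoClass.

```lisp
;; Hypothetical stand-in for an AutoClass search try; only the slots used
;; in this sketch are modeled.
(defstruct demo-try
  (j-in 0)          ; number of classes the try started with
  (log-post 0.0))   ; illustrative score, higher is better

(defun demo-safe-subseq (seq start end)
  "Like SUBSEQ, but clips END to the length of SEQ instead of signalling."
  (subseq seq start (min end (length seq))))

(defun demo-start-j-list (tries &optional (n 6))
  "Return the J-IN values of the N best TRIES, best first."
  (let ((ranked (sort (copy-list tries) #'> :key #'demo-try-log-post)))
    (map 'list #'demo-try-j-in (demo-safe-subseq ranked 0 n))))

;; Usage with three made-up tries:
;; (demo-start-j-list
;;  (list (make-demo-try :j-in 4 :log-post -120.0)
;;        (make-demo-try :j-in 7 :log-post -95.0)
;;        (make-demo-try :j-in 5 :log-post -101.5)))
;; => (7 5 4)
```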
If you experience, or expect, a system crash once in every few classifications,
it would be better to use Find-Best-N-4 to search for good classifications.
This is a primitive version of Autoclass-Search which has a restart capability
(which means that it can begin again with a checkpointed classification after a
crash). Find-Best-N-4's search arguments are similar to those for
AutoClass-Search, but it requires an existing classification as its primary
input. Use Generate-Clsf to make an initial classification, and the
checkpointed classification for restarts. The primary arguments to
Generate-Clsf are the pathnames for the data, header and model files. See the
specific functions for information on the omitted arguments, some of which are
required. Set up :start-j-list as described above.
Start with:
(setf *checkpoint-file* (make-pathname .....))
(setf clsf (generate-clsf :n-classes 1 :start-fn 'block-set-clsf ....))
(find-best-n-4 clsf :start-j-list j-list .....)
And restart with:
(setf clsf (first (Get-Clsf-Seq *checkpoint-file* :expand t :wts t)))
(find-best-n-4 clsf :restart t :start-j-list j-list .....)
If you find that most classifications crash, you might as well go to a fully
manual search. This starts by initializing a classification with a hopefully
appropriate number of classes. You then apply one of the search functions (see
*try-fn-list*) and keep restarting from the checkpointed version until the
search ends naturally. Alternate classifications are rated by the marginal
posterior in the log-a<X/H> field [use (clsf-log-a<X/H> clsf)]. A useful
tactic, used for the IRAS classification with AutoClass II, is to make a number
of starts that are only converged to a coarse limit and choose the best of these
for further convergence.
Start with:
(setf *checkpoint-file* (make-pathname .....))
(setf clsf (generate-clsf :n-classes N-CLASSES ....))
(<try-fn> clsf .....)
And restart with:
(setf clsf (first (Get-Clsf-Seq *checkpoint-file* :expand t :wts t)))
(<try-fn> clsf .....)
Note that in this case you will have to manage the search for the best number
of classes yourself, by choosing N-CLASSES yourself each time. Try a range of
possibilities, as suggested by previous searches on partial data sets, and
gradually focus in on those that seem to give the best results, as measured by
(clsf-log-a<X/H> clsf).
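The tactic of making several coarse starts and keeping the best one can be
sketched generically as follows. The function demo-best-start is hypothetical
and not part of AutoClass. In real use, MAKE-FN would presumably wrap a
Generate-Clsf call followed by a coarsely converged try, and SCORE-FN would be
the log-a<X/H> accessor; here both are passed in as arguments so that the
sketch stands alone.

```lisp
;; Hypothetical helper, for illustration only; not part of AutoClass.
(defun demo-best-start (n-classes-list make-fn score-fn)
  "Call MAKE-FN on each class count in N-CLASSES-LIST and return, as two
values, the result with the highest SCORE-FN value and that score.
Ties keep the earliest candidate."
  (let ((best nil)
        (best-score nil))
    (dolist (n n-classes-list (values best best-score))
      (let* ((clsf (funcall make-fn n))
             (score (funcall score-fn clsf)))
        (when (or (null best-score) (> score best-score))
          (setf best clsf
                best-score score))))))

;; Usage with toy stand-ins: the "classification" is just the class count,
;; and the made-up score peaks at five classes.
;; (demo-best-start '(2 4 6 9)
;;                  #'identity
;;                  (lambda (n) (- (abs (- n 5)))))
;; => 4, -1
```

The best candidate returned would then be the one to continue converging, with
checkpointing enabled.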
—————————————————————————
(defvar *checkpoint-file* nil "When checkpointing is necessary, set this to be
the pathname of your checkpoint file. Otherwise it MUST be nil. Note that the
file type will be overridden by the AutoClass standard types.")
(defun Base-Cycle (clsf &key (stream t) display-wts)
  "Special checkpointing version of the standard Update-Wts, Update-Parameters,
and Update-Approximations cycle."
  (declare (special *checkpoint-file*))
  (UPDATE-WTS clsf)
  (let ((n-stored (DELETE-NULL-CLASSES clsf)))
    (when (and display-wts (plusp n-stored))
      (format stream "~&~3D null classes stored from base-cycle." n-stored)))
  (UPDATE-PARAMETERS clsf)
  (UPDATE-APPROXIMATIONS clsf)
  (if display-wts (display-step clsf stream))
  ;; A non-nil value must be a pathname; the break lets the user repair it.
  (if *checkpoint-file*
      (unless (pathnamep *checkpoint-file*)
        (break "Checkpointing: reset *checkpoint-file* to a pathname or nil and continue.")))
  ;; Test again, since the user may have reset the variable to nil in the break.
  (if *checkpoint-file*
      (SAVE-CLSF-SEQ (list clsf) *checkpoint-file* :binary t))
  ;; Return the classification's log-a<X/H> value.
  (clsf-log-a<X/H> clsf))
***
I am confused about the interaction between new Base-Cycle and
autoclass-search, find-best-n-4, & <try-fn>: are you saying to use
autoclass-search (with old Base-Cycle) to find minimum number of classes, then
load new Base-Cycle and run either find-best-n-4 or <try-fn>? – I don't
understand the difference between using find-best-n-4 & <try-fn>.